I usually process simulation results in batch, which generally involves pathname manipulation. In this article, I collect notes on pathname manipulation from my programming experience.
1. Overview
1.1 Terms related to pathname
The terms related to pathnames are described below, with definitions excerpted from Wikipedia.
pathname
A path, the general form of the name of a file or directory, specifies a unique location in a file system. A path points to a file system location by following the directory tree hierarchy expressed in a string of characters in which path components, separated by a delimiting character, represent each directory.
dirname
dirname is a standard UNIX computer program. It retrieves the directory-path name from a pathname, ignoring any trailing slashes.
basename
basename is a standard UNIX computer program. It retrieves the last name from a pathname, ignoring any trailing slashes.
Note that the result of os.path.basename(path) differs from that of the Unix basename program: for '/foo/bar/', the basename program returns 'bar', whereas the os.path.basename function returns an empty string ''.
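The difference is easy to check interactively. The sketch below assumes POSIX-style paths and uses posixpath (the implementation behind os.path on Unix-like systems) so that it behaves the same on any platform. unix_basename is a hypothetical helper written for this note, not a standard-library function, and it only approximates the Unix program for simple cases.

```python
import posixpath


def unix_basename(path):
    """Approximate the Unix basename program by stripping trailing slashes first.

    Hypothetical helper for illustration only.
    """
    stripped = path.rstrip('/')
    if not stripped:
        # The path was empty or consisted only of slashes (e.g. '/').
        return '/' if path else ''
    return posixpath.basename(stripped)


print(repr(posixpath.basename('/foo/bar/')))  # ''
print(repr(unix_basename('/foo/bar/')))       # 'bar'
```

Stripping the trailing slash before calling basename is usually enough when a directory path may or may not end with a separator.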
filename
Sometimes "filename" is used to mean the entire name, such as the Windows name c:\directory\myfile.txt. Sometimes, it refers to the last component only, so the filename in this case would be myfile.txt. Sometimes, it excludes the extension, so the filename would be just myfile.
filename extension
A filename extension (such as txt) is an identifier specified as a suffix to the name of a computer file. The extension indicates a characteristic of the file contents or its intended use. A file extension is typically delimited from the filename with a full stop (period).
1.2 Python modules
Three of the most commonly used Python modules for pathname manipulations are listed below.
os.path: Common pathname manipulations. This module implements some useful functions on pathnames. To read or write files see open(), and for accessing the filesystem see the os module. The path parameters can be passed as either strings or bytes.
glob: The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.
pathlib: Object-oriented filesystem paths. This module offers classes representing filesystem paths with semantics appropriate for different operating systems.
Path classes are divided between pure paths, which provide path-handling operations that don’t actually access a filesystem, and concrete paths, which inherit from pure paths but also provide methods to do system calls on path objects.
2. os.path
2.1 split and join
A full file path (e.g., /Users/sparkandshine/Documents/main.py) is composed of two components:
- directory name (/Users/sparkandshine/Documents in this case). This is the first element of the pair returned by os.path.split(path).
- base name (main.py in this case). This is the second element of the pair returned by os.path.split(path).
os.path.split(path) splits the pathname path into a pair, (head, tail), where tail is the last pathname component and head is everything leading up to that.
If path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. The tail part will never contain a slash. Trailing slashes are stripped from head unless it is the root (one or more slashes only).
>>> import os
>>> os.path.split('/Users/sparkandshine/Documents/main.py')
('/Users/sparkandshine/Documents', 'main.py')
>>> os.path.split('/Users/sparkandshine/Documents')
('/Users/sparkandshine', 'Documents')
>>> os.path.split('/Users/sparkandshine/Documents/') # path ends in a slash
('/Users/sparkandshine/Documents', '')
>>> os.path.split('main.py') # no slash in path
('', 'main.py')
>>> os.path.split('') # path is empty
('', '')
>>> os.path.split('/') # root
('/', '')
os.path.splitext(path) # Split the pathname path into a pair (root, ext) such that `root + ext == path`
>>> os.path.splitext('/Users/sparkandshine/Documents/main.py')
('/Users/sparkandshine/Documents/main', '.py')
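Two edge cases of os.path.splitext are worth knowing: only the last extension is split off, and a leading dot is treated as part of the name rather than as an extension. The sketch below also shows a common batch-processing use, swapping the extension of an output file. change_extension is a hypothetical helper introduced for this note.

```python
import os.path


def change_extension(path, new_ext):
    """Return path with its last extension replaced by new_ext (hypothetical helper)."""
    root, _old_ext = os.path.splitext(path)
    return root + new_ext


print(os.path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')
print(os.path.splitext('.bashrc'))         # ('.bashrc', '')
print(os.path.splitext('README'))          # ('README', '')
print(change_extension('/Users/sparkandshine/Documents/main.py', '.txt'))
# /Users/sparkandshine/Documents/main.txt
```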
os.path.join(path, *paths) joins one or more path components intelligently.
The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
>>> os.path.join('/Users/sparkandshine', 'Documents', 'main.py')
'/Users/sparkandshine/Documents/main.py'
# the last part is empty
>>> os.path.join('/Users/sparkandshine', 'Documents', '')
'/Users/sparkandshine/Documents/'
# a component is an absolute path
>>> os.path.join('/Users/sparkandshine', '/home/sparkandshine', 'Documents', 'main.py')
'/home/sparkandshine/Documents/main.py'
os.path.join(head, tail) can be regarded as the inverse operation of os.path.split(path). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ).
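The round trip can be checked with a short sketch. roundtrip is a hypothetical helper introduced here; the example paths are illustrative. A path with a doubled separator shows the case where the location is the same but the string differs.

```python
import os.path


def roundtrip(path):
    """Split a path and join the pieces back together (hypothetical helper)."""
    head, tail = os.path.split(path)
    return os.path.join(head, tail)


for path in ['main.py', 'subdir/main.py', 'a//b']:
    rejoined = roundtrip(path)
    same_string = (rejoined == path)
    # normpath collapses redundant separators, so it compares locations, not strings.
    same_location = (os.path.normpath(rejoined) == os.path.normpath(path))
    print(repr(path), repr(rejoined), same_string, same_location)
```

For 'a//b' the redundant separator is lost in the split, so the rejoined string is not identical to the input even though both name the same location.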
2.2 Basic use
Some of the most commonly used functions are listed below.
os.path.basename(path) # Return the base name of pathname path, the second element returned by `split(path)`.
os.path.dirname(path) # Return the directory name of pathname path, the first element returned by `split(path)`.
os.path.exists(path) # Return True if path refers to an existing path or an open file descriptor.
os.path.abspath(path) # Return a normalized absolutized version of the pathname path.
os.path.isabs(path) # Return True if path is an absolute pathname.
os.path.normpath(path) # Normalize a pathname by collapsing redundant separators and up-level references.
os.path.getatime(path) # Return the time of last access of path.
os.path.getmtime(path) # Return the time of last modification of path.
os.path.getctime(path) # Return the system’s ctime which, on some systems (like Unix) is the time of the last metadata change, and, on others (like Windows), is the creation time for path.
os.path.getsize(path) # Return the size, in bytes, of path.
os.path.isfile(path) # Return True if path is an existing regular file.
os.path.isdir(path) # Return True if path is an existing directory.
os.path.islink(path) # Return True if path refers to a directory entry that is a symbolic link.
os.path.ismount(path) # Return True if pathname path is a mount point.
os.path.samefile(path1, path2) # Return True if both pathname arguments refer to the same file or directory.
os.path.sameopenfile(fp1, fp2) # Return True if the file descriptors fp1 and fp2 refer to the same file.
os.path.samestat(stat1, stat2) # Return True if the stat tuples stat1 and stat2 refer to the same file.
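A few of these functions can be tried safely in a temporary directory. This is a minimal sketch; the file name result.txt and its contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'result.txt')
    with open(path, 'w') as f:
        f.write('hello')  # 5 ASCII characters, so 5 bytes on disk

    info = {
        'exists': os.path.exists(path),
        'isfile': os.path.isfile(path),
        'isdir': os.path.isdir(tmpdir),
        'size': os.path.getsize(path),
        'basename': os.path.basename(path),
        # dirname(path) and tmpdir refer to the same directory.
        'same_dir': os.path.samefile(os.path.dirname(path), tmpdir),
    }

print(info)
# normpath collapses the doubled separator and the up-level reference.
print(os.path.normpath('a//b/../c'))
```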
2.3 Create a directory if it doesn't exist
The snippet below creates a directory only if it does not already exist.
import os
subdir = 'msg_events_arbitrary'
if not os.path.exists(subdir):
    os.makedirs(subdir)
os.mkdir creates a single directory with a numeric mode. os.makedirs is a recursive directory creation function: it works like os.mkdir(), but also makes all intermediate-level directories needed to contain the leaf directory.
# Python2
os.mkdir(path[, mode])
os.makedirs(path[, mode])
# Python3
os.mkdir(path, mode=0o777, *, dir_fd=None)
os.makedirs(name, mode=0o777, exist_ok=False) # If exist_ok is False (the default), an OSError is raised if the target directory already exists.
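In Python 3, passing exist_ok=True lets os.makedirs replace the check-then-create pattern, which also avoids the small window in which another process could create the directory between the check and the call. The sketch below runs in a temporary directory; the names results and run_01 are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, 'results', 'run_01')

    os.makedirs(target, exist_ok=True)  # creates 'results' and 'run_01'
    os.makedirs(target, exist_ok=True)  # already exists: no error
    created = os.path.isdir(target)

    try:
        os.makedirs(target)  # exist_ok defaults to False
        raised = False
    except FileExistsError:  # a subclass of OSError
        raised = True

print(created, raised)  # True True
```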
3. glob
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. The wildcards *, ?, and character ranges expressed with [] are matched correctly. glob treats filenames beginning with a dot . as special cases: by default, a leading dot is not matched by wildcards such as * or ?.
If recursive is true, the pattern “**” will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep (such as /), only directories and subdirectories match. (New in Python 3.5+)
glob.glob(pathname, *, recursive=False) # Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification.
glob.iglob(pathname, *, recursive=False) # Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
glob.escape(pathname) # Escape all special characters ('?', '*' and '['). This is useful to match a string containing special characters.
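The sketch below exercises these rules in a temporary directory. The file names are made up for illustration. Since results come back in arbitrary order, they are sorted before use, and glob.escape is applied to the parts of the pattern that should be taken literally.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    for name in ['run1.csv', 'run2.csv', 'run10.csv', '.hidden.csv', 'data[1].csv']:
        open(os.path.join(tmpdir, name), 'w').close()

    # Escape the directory part in case it contains special characters.
    safe_dir = glob.escape(tmpdir)

    # '?' matches exactly one character, so run10.csv is excluded.
    single_digit = sorted(os.path.basename(p)
                          for p in glob.glob(os.path.join(safe_dir, 'run?.csv')))

    # '*' does not match the leading dot of .hidden.csv.
    all_csv = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(safe_dir, '*.csv')))

    # Without escaping, '[1]' is a character range and matches 'data1.csv' only.
    unescaped = glob.glob(os.path.join(safe_dir, 'data[1].csv'))
    escaped = [os.path.basename(p)
               for p in glob.glob(os.path.join(safe_dir, glob.escape('data[1].csv')))]

print(single_digit)  # ['run1.csv', 'run2.csv']
print(all_csv)       # ['data[1].csv', 'run1.csv', 'run10.csv', 'run2.csv']
print(unescaped)     # []
print(escaped)       # ['data[1].csv']
```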
4. pathlib
pathlib offers classes representing filesystem paths in an object-oriented way. It was added in Python 3.4.
4.1 Pure paths
Pure path objects provide path-handling operations which don’t actually access a filesystem.
import pathlib
class pathlib.PurePath(*pathsegments)
class pathlib.PurePosixPath(*pathsegments)
class pathlib.PureWindowsPath(*pathsegments)
# Examples
>>> p = pathlib.PurePath('subdir', 'subdir_main.py')
>>> p
PurePosixPath('subdir/subdir_main.py')
>>> p.parts
('subdir', 'subdir_main.py')
>>> p.parent
PurePosixPath('subdir')
>>> p.suffix
'.py'
>>> p.stem
'subdir_main'
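Because pure paths never touch the filesystem, they can also manipulate paths for a platform other than the one the code runs on. A minimal sketch, reusing the example paths from earlier in this article:

```python
from pathlib import PurePosixPath, PureWindowsPath

# The / operator joins path segments.
p = PurePosixPath('/Users/sparkandshine/Documents') / 'main.py'
print(p.name)                 # main.py
print(p.with_suffix('.txt'))  # /Users/sparkandshine/Documents/main.txt
print(p.with_name('test.py')) # /Users/sparkandshine/Documents/test.py

# Only the last extension counts as the suffix; suffixes lists all of them.
archive = PurePosixPath('archive.tar.gz')
print(archive.suffix, archive.suffixes)  # .gz ['.tar', '.gz']

# Windows paths can be parsed on any platform.
w = PureWindowsPath(r'c:\directory\myfile.txt')
print(w.drive, w.name, w.stem, w.suffix)  # c: myfile.txt myfile .txt
```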
4.2 Concrete paths
Concrete paths are subclasses of the pure path classes. In addition to operations provided by the latter, they also provide methods to do system calls on path objects.
import pathlib
class pathlib.Path(*pathsegments)
class pathlib.PosixPath(*pathsegments)
class pathlib.WindowsPath(*pathsegments)
# Examples
from pathlib import Path # Import the main class
p = Path('.') # Create an instance
subdirectories = [x for x in p.iterdir() if x.is_dir()] # List subdirectories
q = Path('stackoverflow.py')
>>> q.exists()
True
>>> q.cwd()
PosixPath('/Users/sparkandshine/git/tmp')
>>> q.home()
PosixPath('/Users/sparkandshine')
>>> q.stat()
os.stat_result(st_mode=33252, st_ino=4554446, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=160, st_atime=1485886672, st_mtime=1485886672, st_ctime=1485886672)
with q.open() as f: # open a file
    first_line = f.readline() # read only the first line
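Concrete paths can also create directories and read or write files directly. The sketch below works in a temporary directory so it has no side effects; the names results, run_01 and report.txt are made up for illustration. Path.write_text and Path.read_text require Python 3.5 or later.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    base = Path(tmpdir)

    # Create nested directories in one call, tolerating an existing directory.
    out_dir = base / 'results' / 'run_01'
    out_dir.mkdir(parents=True, exist_ok=True)

    # Write and read a small text file without calling open() explicitly.
    report = out_dir / 'report.txt'
    report.write_text('first line\nsecond line\n')
    first_line = report.read_text().splitlines()[0]

    is_file = report.is_file()
    children = sorted(child.name for child in out_dir.iterdir())

print(first_line)  # first line
print(children)    # ['report.txt']
```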
5. Find all files ending with an extension
Use glob
import glob
files = glob.glob('*.py') # current directory only
# ['main.py', 'stackoverflow.py']
files = glob.glob('/Users/sparkandshine/git/tmp/*.py') # absolute pattern
# ['/Users/sparkandshine/git/tmp/main.py', '/Users/sparkandshine/git/tmp/stackoverflow.py']
files = glob.glob('**/*.py', recursive=True) # recursive, Python 3.5+
# ['main.py', 'stackoverflow.py', 'subdir/subdir_main.py']
Use os.walk
os.walk generates the file names in a directory tree by walking the tree. For each directory it yields a 3-tuple (dirpath, dirnames, filenames).
import os
# Signature: os.walk(top, topdown=True, onerror=None, followlinks=False)
for root, dirs, files in os.walk("."):
    for file in files:
        if file.endswith(".py"):
            print(os.path.join(root, file))
# Output
./main.py
./stackoverflow.py
./subdir/subdir_main.py
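With the default topdown=True, os.walk lets the caller prune the walk by editing the dirnames list in place. The sketch below wraps the loop in a generator and skips hidden directories. find_files is a hypothetical helper, and the directory layout is created in a temporary directory purely for illustration.

```python
import os
import tempfile


def find_files(top, extension):
    """Yield paths under top whose names end with extension (hypothetical helper)."""
    for root, dirs, files in os.walk(top):
        # In-place assignment prunes the walk: hidden directories are not visited.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if name.endswith(extension):
                yield os.path.join(root, name)


with tempfile.TemporaryDirectory() as tmpdir:
    os.makedirs(os.path.join(tmpdir, 'subdir'))
    os.makedirs(os.path.join(tmpdir, '.git'))
    for rel in ['main.py', 'notes.txt',
                os.path.join('subdir', 'subdir_main.py'),
                os.path.join('.git', 'hook.py')]:
        open(os.path.join(tmpdir, rel), 'w').close()

    # Report paths relative to the top directory, in a stable order.
    found = sorted(os.path.relpath(p, tmpdir) for p in find_files(tmpdir, '.py'))

print(found)
```

The file under .git is not reported because its directory was removed from dirs before os.walk descended into it.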
Use pathlib
>>> from pathlib import Path
>>> p = Path('.')
>>> list(p.glob('**/*.py'))
[PosixPath('main.py'), PosixPath('stackoverflow.py'), PosixPath('subdir/subdir_main.py')]
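Path.rglob(pattern) is shorthand for Path.glob with '**/' prepended to the pattern. The sketch below compares the two in a temporary directory that mimics the layout used in this section; sorting gives a stable order, and as_posix() keeps the printed paths identical across platforms.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    base = Path(tmpdir)
    (base / 'subdir').mkdir()
    for rel in ['main.py', 'stackoverflow.py', 'subdir/subdir_main.py', 'notes.txt']:
        (base / rel).touch()

    via_glob = sorted(p.relative_to(base).as_posix() for p in base.glob('**/*.py'))
    via_rglob = sorted(p.relative_to(base).as_posix() for p in base.rglob('*.py'))

print(via_glob)   # ['main.py', 'stackoverflow.py', 'subdir/subdir_main.py']
print(via_rglob)  # ['main.py', 'stackoverflow.py', 'subdir/subdir_main.py']
```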
Use os.listdir
os.listdir(path='.') returns a list containing the names of the entries in the directory given by path.
import os
files = [file for file in os.listdir('.') if file.endswith(".py")]
# ['main.py', 'stackoverflow.py']
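Two details of os.listdir are easy to miss: it returns bare names rather than paths, and it lists directories as well as files. The sketch below joins each name with its directory and keeps regular files only. list_files_with_extension is a hypothetical helper, and the entries are created in a temporary directory for illustration.

```python
import os
import tempfile


def list_files_with_extension(directory, extension):
    """Return sorted paths of regular files in directory ending with extension.

    Hypothetical helper for illustration; it does not recurse into subdirectories.
    """
    paths = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)  # listdir returns names only
        if name.endswith(extension) and os.path.isfile(full_path):
            paths.append(full_path)
    return sorted(paths)


with tempfile.TemporaryDirectory() as tmpdir:
    open(os.path.join(tmpdir, 'main.py'), 'w').close()
    open(os.path.join(tmpdir, 'notes.txt'), 'w').close()
    os.mkdir(os.path.join(tmpdir, 'pkg.py'))  # a directory whose name ends in .py

    names = [os.path.basename(p) for p in list_files_with_extension(tmpdir, '.py')]

print(names)  # ['main.py']
```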